ABSTRACT


In E-commerce, a widely used strategy to improve the customer shopping experience and address 
information overload is the implementation of recommendation systems. Many E-commerce platforms have 
their proprietary recommendation algorithms, with content-based filtering being a commonly employed 
approach. This algorithm provides non-personalized suggestions to users based on the similarity of content. 
The shopping experience involves the decision-making process, with a significant focus on information 
search. In online shopping, customers heavily rely on information search to gain in-depth insights into 
products, as physical interaction is not possible. Customer reviews play a crucial role in this information 
search, offering shoppers the opportunity to learn from the experiences of previous customers. Despite the 
importance of customer reviews, existing recommendation solutions often overlook this aspect in their 
product recommendations. To address this gap, sentiment analysis, a natural language processing task
frequently applied to reviews, is employed to classify and quantify text based on its polarity. This research
introduces a recommendation pipeline that combines content-based filtering, using cosine similarity
calculations, with sentiment analysis, using a pre-trained RoBERTa language model. The focus is on
quantifying customer reviews from an E-commerce platform in Malaysia. The goal of this research is to
develop an embedded system that recommends products to users based not only on their similarity but also
on high ratings from various E-commerce sites.


Keywords: Sentiment Analysis, Content-Based Recommendation, Sentiment-Based Ranking, Product
Review Analysis


1. INTRODUCTION 


The evolution of commerce has
undergone a remarkable transformation,
sculpting a landscape characterized by
heightened accessibility and seamless
communication between vendors and consumers.
This metamorphosis revolves around the concept 
of electronic commerce or e-commerce, wherein 
business operations harness the power of the 
internet and information technology. The 
expeditious growth of e-commerce is propelled 
by influential factors like mobile technology and 
internet connectivity, ushering in an era where 
transactions effortlessly transcend geographical 
boundaries [1]. 


The recent global pandemic, COVID-19,
has acted as a catalyst, expediting this trend.
It has compelled individuals and businesses to
embrace online platforms, driven by the
constraints posed by physical limitations.
Moreover, e-commerce offers a dynamic
platform for small and medium-sized enterprises
to expand their market reach by establishing a
digital footprint.


Nonetheless, despite its advantages,
online shopping presents various challenges,
notably information overload. This occurs when
a consumer has access to so much readily
available information that it becomes difficult
to make an informed decision. Furthermore,
online shopping is deemed to be time-consuming.
A survey conducted by [2] found that 43% of the
customers spent more than 7 hours a day and 26%
of the customers spent 4-6 hours a day scrolling
through e-commerce websites, regardless of
whether they ended up purchasing an item. In
addition, 65% of them reported opening
e-commerce websites more than 10 times a day.
Decision-making is a big part of the process
that consumers go through before purchasing a
product. One activity that needs to be
highlighted is the information search process,
in which consumers collect information about
the product and which is commonly the most
time-consuming part of online shopping. [3]
argued that online shopping takes more time and
effort even compared to visiting physical shops,
because at this stage of decision-making
consumers must constantly search for new
information and reassure themselves about the
choices they have made.


Supported by empirical evidence,
customer reviews emerge as a pivotal
determinant influencing purchasing decisions.
As substantiated by [4], the mere inclusion of an
item's ratings on a page yields a notable
increase in sales of
12.97%. Moreover, the significance of written
reviews should not be underestimated, as they 
play a crucial role in instilling confidence in 
potential buyers. This is underscored by the 
acknowledgement from [5] that consumers 
heavily rely on the experiences of others when 
making purchasing decisions, with a particular 
emphasis on negative encounters to mitigate 
potential risks. 


Numerous strategies have been devised 
to enhance the customer decision-making 
process. Among these, recommendation systems 
are frequently employed to address challenges 
associated with information overload, utilizing 
sophisticated algorithms to forecast user 
preferences [6]. Many e-commerce platforms 
boast proprietary, in-house algorithms capable of 
delivering personalized suggestions to users. 
Nevertheless, these solutions often overlook 
inputs from alternative e-commerce sources, and 
textual reviews are not given due consideration. 
Additionally, a multitude of endeavors have been 
undertaken to facilitate cross-website product 
comparisons, manifesting in the development of 
price comparison systems. 


Automated solutions that consider the
diverse range of consumer purchase experiences
are lacking, even though the influence of these
experiences on consumer decisions cannot be
overstated. Innovative technologies, particularly
Artificial Intelligence (AI) and Natural Language
Processing (NLP), hold the potential to alleviate
information overload. NLP, a subset of AI, can
recommend listings by analyzing textual data,
including product descriptions and feedback.
This enables content-based filtering and
sentiment analysis, streamlining the
decision-making process.


Despite the touted benefits of e-commerce,
such as saving time, research suggests that online
shopping can be time-consuming, with users
spending significant hours browsing.
Decision-making in online shopping, particularly
information search, is highlighted as a major
time-consuming activity. Reviews play a crucial
role in purchasing decisions, with positive
ratings and detailed feedback boosting
confidence, although an overabundance of
positive reviews can raise suspicion due to
potential manipulation. Online shopping is
argued to require more time and effort than
physical stores due to the constant need for
information search and decision-making. Various
solutions have been proposed, including
personalized recommendations and price
comparison systems, but there is a lack of
automated solutions that consider other
consumers' experiences, which significantly
influence purchasing decisions.


In this investigation, a sophisticated 
recommendation system has been designed to 
suggest consumer products. The system relies on 
a dual approach, leveraging content similarity 
and sentiment analysis of written reviews to 
pinpoint the most highly rated listings. By 
incorporating sentiment analysis, the algorithm 
not only considers the inherent characteristics of 
the products (content similarity) but also factors 
in the sentiments expressed in user reviews. As 
the user's presently viewed listing serves as the 
input, the algorithm adeptly identifies and 
recommends analogous products by scrutinizing 
the listing's title and description. This method 
ensures a nuanced and comprehensive 
recommendation process, enhancing the overall 
user experience. 


2. LITERATURE REVIEW 


Prioritizing recommendations in
e-commerce, [7] underscored various elements
within a Content-based Recommender
Information Filtering architecture. The aim was
to streamline the processes inherent in
content-based filtering. When analyzing
unstructured data, a precise pre-processing
method is needed to transform it into
structured data for subsequent
processing, specifically for feature extraction.
Multiple techniques are available for feature
extraction, encompassing concepts, keywords,
and n-grams.


In recommender systems, personalized 
recommendations have become the standard. 
However, there is a notable exception in the e- 
commerce domain, where non-personalized 
recommendations are predominantly employed, 
particularly on an e-commerce homepage where 
standardization is key, leading to identical 
outcomes for all users. Non-personalized
recommendation implementations in recommender
systems are examined
in the study conducted by [8]. This study
emphasizes two widely used algorithms in this 
context: the aggregated opinion approach and the 
basic association recommender. 


A separate investigation by [9] aimed to
deliver both personalized (UBCF) and
non-personalized (IBCF) recommendations for top N
movies. The non-personalized recommendations 
leveraged the k-fold algorithm, wherein users 
were randomly assigned to different groups. 
Subsequently, a weighted sum was calculated to 
generate recommendations, with ratings serving 
as weights following similarity computations 
between the input and target movies. 


Concerning sentiment analysis,
lexicon-based sentiment analysis stands out as a
widely embraced methodology. It involves
gauging the sentiment of a text or document by
ascribing sentiment scores to individual words
and subsequently combining them. In this
context, [10] devised a human sentiment analysis
model capable of conducting sentiment analysis
across diverse domains, eliminating the need
for specific domain expertise. The model
operates on a dictionary-based lexicon approach,
leveraging the SentiWordNet tool. This tool
encompasses an extensive vocabulary list,
attributing a sentiment score to each word, such
as "excellent: +3". After scores are assigned
to each word within the text, the
overall polarity score of the document is
derived by summing the scores of all words
present in the document.
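

The summation described above can be sketched in a few lines. This is a minimal illustration only: the tiny lexicon and its integer scores are hypothetical values in the style of the "excellent: +3" example, not the actual SentiWordNet vocabulary, and the function name is an assumption.

```python
import re

# Hypothetical toy lexicon (assumption); a real lexicon is far larger.
TOY_LEXICON = {"excellent": 3, "good": 2, "poor": -2, "terrible": -3}


def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Sum the sentiment scores of every known word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Words missing from the lexicon contribute 0 to the total.
    return sum(lexicon.get(word, 0) for word in words)


print(lexicon_polarity("Excellent strap, but poor zipper"))  # 3 + (-2) = 1
```

A positive total suggests an overall positive document, a negative total an overall negative one.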


A study conducted by [11] analyzed
various pre-trained language
models. Among the models scrutinized were
ULMFiT, Transformer, GPT-2, BiGRU, BERT,
Transformer-XL, and XLNet. The evaluation
focused on BERT and BiGRU, the two most 
recent models, utilizing IMDb movie review 
datasets to gauge their performance. In the 
pursuit of determining dataset polarity, the 
researchers recognized XLNet as an 
outperformer compared to other models. 
However, it was noted that XLNet demands 
substantial computational complexity, 
necessitating robust hardware for optimal 
performance. Despite this, the examination of the 
two models revealed that BERT exhibited 
superior accuracy, registering an impressive 
0.904, as opposed to BiGRU, which achieved an 
accuracy of 0.7206. 


BERT has emerged as a prevalent
choice for NLP tasks, particularly in the realm of
opinion mining, owing to its remarkable
accuracy in sentiment classification. The study
by [12] applies BERT to sentiment analysis.
The model
underwent fine-tuning using an unlabeled dataset
sourced from Indonesian mobile applications,
acquired through web scraping from the Google
Play website. Two BERT models were employed
in that study, namely the multilingual
BERT-Base and IndoBERT Base models.


In a study conducted by [13], a mobile 
application was developed to conduct real-time 
sentiment analysis on product reviews. 
Specifically designed for e-commerce platforms, 
the researchers utilized datasets extracted from 
Amazon reviews. Employing SVM techniques, 
the system efficiently classified reviews into 
positive and negative categories, leveraging one 
of the simplest algorithms known for its high 
accuracy. The interface visually represented the 
distribution of positive and negative reviews in a 
clear listing. Notably, this system did not 
incorporate the consideration of neutral reviews. 
The model achieved a
93.54% F1 score, indicating strong predictive
performance. This research is
relevant to our project, as it shares a similar
objective of implementing real-time analysis on
product reviews sourced directly from
e-commerce platforms. It serves as a valuable
reference for our approach, although our
system differs in its
functionality. Unlike the mentioned study, our
intended system aims to provide
recommendations or rankings based on the
polarity of the reviews.


In a study conducted by [14], a
comprehensive framework was developed for
food recommendation based on sentiment
analysis. This framework
encompasses the classification of both food and 
sentiments. Each food item is categorized into 
one of four distinct courses, allowing for a 
nuanced analysis. Subsequently, the system 
identifies the most favourably reviewed food 
items based on the sentiments expressed in the 
reviews. To gauge sentiments, the reviews are 
evaluated using an AFINN scoring system, 
which spans from -5 to 5. The research explored 
various classification models, including Support 
Vector Machines (SVM), Random Forest, 
Logistic Regression, and AdaBoost. Notably, 
SVM demonstrated superior performance in 
sentiment classification, achieving an impressive 
98.9% validation accuracy. In addition to the
classification models, the Bag of
Words technique was employed for feature
extraction, allowing the model to distil
essential information
from the reviews. The outcome of this process
was a top-n list of recommended foods for each
category. Because the model relied primarily
on the Bag of Words technique
for feature extraction, the resulting
recommendation system, while effective, is
non-personalized. Future enhancements could
explore incorporating more personalized features
to tailor recommendations to individual
preferences.


3. SYSTEM OVERVIEW 


The system incorporates an embedded
framework designed with the specific aim of
suggesting highly reviewed products to users,
tailored to their selected preferences.


[Figure 1 diagram. Recoverable labels: E-commerce Site; Web scraping; Reviews and Product list (raw); Sentiment Classification Pipeline (Batch Processing); Text pre-processing; Polarity score; Sentiment score calculation; Pre-processed data and sentiment score; Recommendation Pipeline; Selected product; Products list; similarity calculation; sentiment score; best product.]


Figure 1: Proposed System Design


Fig. 1 illustrates the essential
components and the data flow within the
system. The main elements are the
sentiment classification pipeline, the
recommendation pipeline (API), the database, the
web scraping algorithm, and the website. The data
flow begins at the E-commerce site, where
information is gathered through
web scraping and then stored in the
database.


Journal of Theoretical and Applied Information Technology
15th May 2024. Vol.102. No 9
© Little Lion Scientific
ISSN: 1992-8645 www.jatit.org E-ISSN: 1817-3195


The raw data is transformed by the
sentiment classification
pipeline, which involves text pre-processing,
sentiment classification, scoring
using a RoBERTa-based model, and subsequent
score aggregation. Reviews are
processed in batches before being written
back to the database. This
database serves as the basis for the
recommendation process, which is handled by the
Flask API housing the recommendation pipeline.


The recommendation process begins
when a user selects a specific product, which
triggers the recommendation pipeline. The
results of this process are
relayed back to the system's website and
displayed to the
user.


4. METHODOLOGY 


4.1 Initial Research 

The requirements were validated
through a survey administered using Google
Forms, which drew 36
respondents. The survey was
organized into three sections.


In the first section, participants
shared their demographic details. The
second section covered online shopping in
general, aiming to capture
participants' overall perceptions and
experiences. The
final section explored
consumer behaviour and attitudes regarding
customer reviews, and it played a key role
in confirming the need for and relevance of
the proposed system. In total, the survey
comprised 21 questions.


4.2 Data Collection 
The data was gathered from
an e-commerce platform in Malaysia, specifically
from the men's bag category. The
collection encompasses both
product details and customer reviews. The
resulting dataset is organized into
two parts: one holding product-related
information and the other holding
customer reviews. The collected fields are as
follows:


Product Dataset
• Product name
• Product rating
• Number of ratings
• Number of products sold
• Shop name
• Product URL
• Product description
• Product price
• Product Image URL


Review Dataset
• Review time
• Product variant
• Review content

The information was systematically
gathered in a series of five batches, with each
batch comprising approximately 60 rows of
products and 7300 rows of reviews.


4.3 Data Pre-processing 

For sentiment scoring and
classification, a pre-trained model was
employed, so only a set of minor pre-processing
steps was applied, primarily to standardize the
scraped data.


4.3.1 Data Cleaning 

Data cleaning refines the data by
identifying and correcting noise,
inconsistencies, and inaccuracies. This
process encompasses the removal
of duplicates, the handling of missing values,
standardization, and other refinements to ensure
data integrity and reliability.


4.3.2 Text Translation 

Translation APIs are used to
convert textual data from one
designated language into another specified
language.


4.3.3 Data Transformation 
The transformation process involves
changes to the units or data types of
designated columns.


4.3.4 Data Integration 

Data integration combines
information gathered from various
sources into a single
cohesive dataset.


4.4 Text Encoding and Sentiment Scoring 
This section describes the text encoding
process, which transforms texts into a format
suitable for processing by the model, and the
sentiment classification task, which uses a
pre-trained transformer model retrieved from
HuggingFace.


The core of our methodology is the
multilingual XLM-RoBERTa-base model, a
pre-trained model fine-tuned on a
corpus of approximately 198 million tweets. This
fine-tuning, carried out by [15],
tailors the model for sentiment
analysis across multiple languages. Notably,
the model supports
languages such as English and
Mandarin, rendering translation of the
Mandarin text superfluous.


The text is encoded with the
AutoTokenizer loaded from the pre-trained
model. The tokenizer converts the
textual data into the input format expected
by the model. Following the encoding
process, the model generates
sentiment scores for
the text. The
results of the analysis in dictionary form are as
follows:


{negative: "negative probability score",
positive: "positive probability score",
neutral: "neutral probability score"}


The sentiments in the reviews are
categorized based on their highest probability
scores. The
sentiment distribution is shown in Fig. 2, and
the sentiment word clouds are shown in Figs.
3, 4, and 5.


Figure 2: Polarity Distribution of Product Reviews


Figure 3: Word Cloud of Negative Category Reviews 


Figure 4: Word Cloud of Positive Category Reviews 


Figure 5: Word Cloud of Neutral Category Reviews 
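

The categorization by highest probability score can be sketched as follows. This is an illustrative sketch under stated assumptions: the label order, the use of a softmax to turn raw model outputs into probabilities, and the function names are not taken from the paper.

```python
import math

LABELS = ("negative", "neutral", "positive")  # assumed label order


def to_probabilities(logits, labels=LABELS):
    """Turn raw model outputs into a {label: probability} dictionary (softmax)."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def classify(probabilities):
    """Return the label whose probability score is highest."""
    return max(probabilities, key=probabilities.get)


scores = to_probabilities([-1.2, 0.3, 2.1])  # made-up logits for illustration
print(scores, classify(scores))
```

The dictionary returned by `to_probabilities` has the same shape as the result dictionary shown earlier in this section, so `classify` can be applied to it directly.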


4.5 Score Accumulation and Mapping 
Following the retrieval of the sentiment
distribution scores, these scores are
consolidated into a single metric. Positive
reviews contribute a score of 2 points, negative
reviews incur a score of -2 points, and neutral
reviews are assigned 1 point. Subsequently, the
sum of these values is computed. The
equation is as follows:


$$2\,\mathrm{pos} + 1\,\mathrm{neu} - 2\,\mathrm{neg}$$
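

As a rough sketch, the weighting can be applied to one review's probability dictionary as shown below. The interpretation that pos, neu, and neg are the review's positive, neutral, and negative probability scores is an assumption, as are the function and constant names.

```python
# Weights taken from the text: positive 2, neutral 1, negative -2.
WEIGHTS = {"positive": 2, "neutral": 1, "negative": -2}


def accumulated_polarity(polarity_score):
    """Combine one review's three sentiment scores into a single value.

    Assumes polarity_score maps "positive", "neutral", and "negative"
    to that review's probability scores.
    """
    return sum(weight * polarity_score[label] for label, weight in WEIGHTS.items())


review_scores = {"negative": 0.1, "neutral": 0.2, "positive": 0.7}  # made-up values
print(accumulated_polarity(review_scores))  # approximately 1.4
```

Under this interpretation the value ranges from -2 (certainly negative) to 2 (certainly positive).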


A new column was formed to contain 
the accumulated value of the sentiment scoring. 
The final shape of the review dataset in a 
dictionary form is as follows: 


{
review_id: "review unique id",
item_id: "id of the product the review is associated with",
review_date: "date when the review was made",
variant: "variant of the product the review refers to",
review_content: "review written content",
polarity_score: "score generated from model",
review_sentiment: "sentiment classification of review",
accu_polarity_score: "aggregation of polarity score"
}


After the scores had been accumulated, the
reviews were grouped by 'item_id', and the
'accu_polarity_score' values of each group were
averaged. The average score was then mapped
to each product in the product dataset. The final
shape of the product dataset in dictionary form is
as follows:


{
item_id: "product unique id",
item_name: "product title",
item_rating: "overall ratings of product",
rating_num: "number of product ratings",
item_sold: "number of product sold",
shop_name: "name of shop that sells the product",
item_url: "URL of Shopee page",
item_description: "product description",
item_price: "product price",
accu_score: "aggregated polarity score from reviews dataset",
item_image: "image URL of item"
}
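

The grouping and averaging step can be sketched with the standard library alone. This is a minimal sketch assuming underscore-separated field names (`item_id`, `accu_polarity_score`, `accu_score`); the function name and the choice to store `None` for products without reviews are assumptions.

```python
from collections import defaultdict
from statistics import mean


def map_scores_to_products(reviews, products):
    """Average each product's review scores and attach the result as accu_score."""
    scores_by_item = defaultdict(list)
    for review in reviews:
        scores_by_item[review["item_id"]].append(review["accu_polarity_score"])

    enriched = []
    for product in products:
        item_scores = scores_by_item.get(product["item_id"])
        # Products without any review get None (an assumption of this sketch).
        average = mean(item_scores) if item_scores else None
        enriched.append({**product, "accu_score": average})
    return enriched


reviews = [
    {"item_id": "A", "accu_polarity_score": 2.0},
    {"item_id": "A", "accu_polarity_score": 1.0},
    {"item_id": "B", "accu_polarity_score": -1.0},
]
products = [{"item_id": "A"}, {"item_id": "B"}, {"item_id": "C"}]
print(map_scores_to_products(reviews, products))
```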


4.6 Content-based Recommendation 

This section describes the
methodology of content-based recommendation,
which finds similarities among products by
analyzing both product titles and descriptions.
By using the content of these
elements, we seek to enhance the precision of
our recommendation system, providing users
with more relevant suggestions.


4.6.1 Text Cleaning 

The procedure involves eliminating punctuation 
and special characters, as well as standardizing 
the text. 


4.6.2 Text Encoding 
The texts are transformed into a TF-IDF matrix,
a format suitable for similarity calculation.


Cosine Similarity: Cosine similarity, a prevalent
method for computing similarities between
vectors, is employed as the similarity measure
between the rows of the generated TF-IDF
matrix. The formula for
Cosine Similarity calculation is in equation (1).


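$$\text{similarity}(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} \tag{1}$$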

The outcome is expressed as a 
numerical value ranging from 0 to 1, where 
proximity to 1 signifies a higher degree of 
similarity, whereas proximity to 0 indicates 
greater dissimilarity. 
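

The following is a small, dependency-free sketch of TF-IDF encoding followed by the cosine similarity calculation. It is not the implementation used in the system: the whitespace tokenization, the smoothed IDF variant, and the function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Encode documents as TF-IDF vectors over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vocabulary = sorted(doc_freq)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append([
            # term frequency times a smoothed inverse document frequency
            (counts[term] / length)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term in vocabulary
        ])
    return vectors


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for all-zero vectors
    return dot / (norm_a * norm_b)


titles = ["black leather sling bag", "black canvas bag", "canvas backpack"]
vectors = tfidf_vectors(titles)
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

Because TF-IDF weights are never negative, the result stays within the 0 to 1 range described above; the first and third titles share no terms, so their similarity is 0.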


4.6.3 Similarity and Sentiment Ranking 
The generated results consist of the top N most
similar items, re-sorted according
to their accumulated polarity scores so that the
best-reviewed similar items are presented
first.


5. IMPLEMENTATION 
5.1 Sentiment Analysis Pipeline 


Following the retrieval of the data, a
sentiment analysis pipeline
was established. Scraping and processing were
carried out in batches to reduce
the potential for errors during the
processing stages.


Once the data had been retrieved in
batches and stored in CSV files, it was
explored. This exploratory
phase included
identifying any missing or duplicate
data, determining the dimensions of the dataset
by counting rows and columns, and
checking the data
types of all columns.


After completing the initial checks, we 
proceeded with essential pre-processing steps. In 
addition to fundamental tasks like eliminating 
duplicates, addressing missing values, and 
transforming data types, we meticulously pre- 
processed the textual content within the reviews. 
This involved tasks such as removing symbols 
and characters, as well as rectifying irregular 
spacing. These actions were executed using 
regular expression (regex) functions to ensure 
the refinement of the text data. 
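

A minimal sketch of this cleaning step is shown below. The exact symbols removed in the original pipeline are not specified, so the regex patterns here are assumptions, as are the function name and the `item_id` and `review_content` field names.

```python
import re


def clean_reviews(rows):
    """Drop missing and duplicate reviews, strip symbols, and normalize spacing."""
    seen = set()
    cleaned = []
    for row in rows:
        text = row.get("review_content")
        if not text or not text.strip():
            continue  # missing value
        # \w is Unicode-aware in Python 3, so Chinese characters are kept.
        text = re.sub(r"[^\w\s]", " ", text)
        text = re.sub(r"\s+", " ", text).strip()
        key = (row.get("item_id"), text)
        if not text or key in seen:
            continue  # empty after cleaning, or a duplicate
        seen.add(key)
        cleaned.append({**row, "review_content": text})
    return cleaned


rows = [
    {"item_id": 1, "review_content": "Good   bag!!! :)"},
    {"item_id": 1, "review_content": "Good bag"},
    {"item_id": 1, "review_content": None},
]
print(clean_reviews(rows))
```

In this sketch duplicates are detected after cleaning, so two reviews of the same product that differ only in symbols or spacing count as one.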


After completing the cleaning phase, we
used the Google Translate API to
translate all Malay reviews into English.
Some terms, such as abbreviations and
informal language such as slang, posed
challenges for the API and had to be
translated manually by our developers.


It's important to highlight that we opted 
against introducing an additional translation 
layer, even though the text retained Chinese 
characters. This decision was informed by the 
fact that our sentiment scoring and classification 
model underwent specialized training in various 
languages, encompassing Mandarin and English. 
This proficiency empowered the pre-trained 
model to adeptly navigate and interpret the 
intricacies of these language blends. 


Once the sentiment scores had been
generated, a new metric, the accumulated
sentiment score, was computed by applying
the formula. The mean of
this metric, grouped by item ID, was then mapped
to the corresponding products, adding a
new attribute to the product dataset. This
attribute serves as a key
determinant within the recommendation pipeline.


Finally, the various batches of review
and product datasets were
consolidated into separate JSON files, ready
to be imported into the database.


5.2 Recommendation Pipeline 

The recommendation pipeline
was integrated into the API and calculates
similarity from the input data. As in the
sentiment analysis pipeline, the text
was pre-processed first. Special characters and
punctuation were removed using regular
expressions: the punctuation marks provided by
the string library were collected in a
list, and a regex built from this
list removed them from
the text. Finally, all text was converted
to lowercase to standardize the letter case.
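

One way to express this punctuation removal is sketched below. It assumes the punctuation list is Python's `string.punctuation`; replacing each mark with a space (rather than deleting it) and the function name are choices made for this illustration.

```python
import re
import string

# Character class built from string.punctuation; re.escape protects
# characters such as "]" and "\" that are special inside a regex.
PUNCTUATION_PATTERN = re.compile("[" + re.escape(string.punctuation) + "]")


def normalize_listing_text(text):
    """Remove punctuation, collapse whitespace, and convert to lowercase."""
    without_punctuation = PUNCTUATION_PATTERN.sub(" ", text)
    return re.sub(r"\s+", " ", without_punctuation).strip().lower()


print(normalize_listing_text("Men's Sling-Bag (Waterproof)!"))
# men s sling bag waterproof
```

Substituting a space keeps hyphenated words such as "Sling-Bag" as two separate tokens instead of fusing them.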


Following the cleaning step, the
text was transformed into a TF-IDF matrix using
the TfidfVectorizer function. A
similarity matrix was then created, holding
the cosine similarity values
between the TF-IDF vectors of the products.
Sorting each row of this matrix in descending
order allowed the extraction of the top N
products with the highest similarities. Because
the rows of the similarity matrix are indexed
by product, each row corresponds
to one item.
Consequently, when a specific product was
designated as the input, the system
referenced the associated row of the similarity
matrix to derive the top 10 products exhibiting
the greatest resemblance.


These 10 products were then
re-ranked by their sentiment
scores. Ultimately, the 5 items
with the highest sentiment rankings were
selected as the final
output.
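

The two-stage selection can be sketched as follows. This is an illustration under assumptions: the similarity matrix is taken to be a list of rows indexed by product, every product is assumed to carry a numeric `accu_score`, and excluding the selected product from its own recommendations is a choice of this sketch rather than something stated in the text.

```python
def recommend(selected_index, similarity_matrix, products, top_similar=10, top_final=5):
    """Shortlist the most similar products, then re-rank them by sentiment score."""
    row = similarity_matrix[selected_index]
    # Exclude the selected product itself (assumption of this sketch).
    candidates = [i for i in range(len(row)) if i != selected_index]
    candidates.sort(key=lambda i: row[i], reverse=True)
    shortlist = candidates[:top_similar]
    shortlist.sort(key=lambda i: products[i]["accu_score"], reverse=True)
    return [products[i] for i in shortlist[:top_final]]


products = [
    {"item_id": "A", "accu_score": 0.5},
    {"item_id": "B", "accu_score": 1.8},
    {"item_id": "C", "accu_score": 1.2},
    {"item_id": "D", "accu_score": 1.9},
]
similarity = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
print(recommend(0, similarity, products, top_similar=2, top_final=2))
```

With `top_similar=2`, product D is never considered for input A despite its high sentiment score, because similarity acts as the first filter and sentiment only orders the shortlist.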


5.3 Database Development 

After the JSON file was generated, it was
imported into MySQL. SQL queries were then
used to adjust the data types accordingly. The
database was connected to the
backend API through the flask_mysqldb library.


5.4 Web Development 

The web development of this project
covers both the backend and the
front end. The backend was developed using the
Python Flask framework and runs
locally on port 5000. The Flask API consists of
three routes: /main, /input, and /result. /main
takes all products from the database and returns
them to the front end as options for the users.
/input retrieves the user's selection and processes
it through the recommendation pipeline.
/result retrieves the results calculated by the
recommendation pipeline in the /input route and
returns them in JSON format to the front end.


HTML, CSS, and JavaScript were
used to develop the website's front end
with the ReactJS framework, whose component
concept eases the development of repetitive
containers. The React app
runs on port 3000. Because the React app
needs to communicate with the backend API
running on port 5000, the backend address has to
be defined in the package.json file by adding the
following line:


"proxy": "http://localhost:5000", 


Two pages were made for the website,
inputPage.js and resultPage.js, as shown
in Figs. 6, 7, 8, and 9. inputPage lists
all available products in the database for users to
select. Users are then directed to resultPage, where
the recommendation results are shown to the
user.


Figure 6: Site Main Page (1) 


Figure 7: Site Main Page (2) 


Figure 9: Site Result Page (2) 


The validation process for the system 
comprised two main testing approaches: unit 
testing and user acceptance testing. Unit testing, 
conducted by the developer, involved assessing 
each functionality to ensure proper operation. 
The results showed successful interactions and 
all components working as expected, with no 
encountered bugs. For user acceptance testing, 
students from different academic backgrounds 
were chosen to evaluate the system. They 
conducted a thorough assessment of
functionalities, identified potential bugs, 
measured system speed, and rated the user 
interface and overall experience. Overall, the 
participants were satisfied with the system's 
functionality, reported no bugs, and found the 
system's performance speed satisfactory. 


Participants found the interface navigable and 
user-friendly, giving positive feedback on its 
design. However, two participants suggested 
improving the user experience by adding 
clickable containers alongside product titles for 
easier selection. Additionally, one participant 
suggested incorporating initial page instructions 
for better user orientation, particularly for users 
with an engineering background. Despite these 
suggestions, the system was considered 
satisfactory and successful by the users. They 
recognized the relevance and usefulness of the 
system's recommendations in addressing issues. 
Participants expressed a desire for additional 
features to provide more comprehensive analyses 
and product review details. In conclusion, user 
acceptance testing validated the system's 
functionality, user interface, and relevance, 
setting a positive path for further improvements 
based on valuable user feedback. 


Two techniques, content-based filtering and 
collaborative filtering, are typically used in 
recommendation systems. However, implementing 
collaborative filtering is challenging because 
publicly available user behavior data is limited. 
To address this, a simplified approach employing 
item-based collaborative filtering, which is 
similar to content-based filtering, will be 
utilized. 
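
As a rough illustration of the content-based side, the sketch below 
shortlists products by cosine similarity and then orders the shortlist 
by sentiment score. It is a sketch under stated assumptions and not the 
pipeline used in this work: similarity is computed on raw word counts of 
product titles, the sentiment values are hard-coded stand-ins for scores 
that a sentiment model would produce, and the function names, the sample 
catalog, and the two-stage ordering are all hypothetical. 

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts using raw word counts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query, catalog, top_k=2):
    """Keep the top_k most similar items, then order them by sentiment."""
    others = [item for item in catalog if item["title"] != query["title"]]
    shortlist = sorted(
        others,
        key=lambda item: cosine_similarity(query["title"], item["title"]),
        reverse=True,
    )[:top_k]
    return sorted(shortlist, key=lambda item: item["sentiment"], reverse=True)


# Hypothetical catalog; sentiment values are illustrative, not measured.
catalog = [
    {"title": "wireless optical mouse", "sentiment": 0.62},
    {"title": "wireless gaming mouse", "sentiment": 0.91},
    {"title": "wireless ergonomic mouse", "sentiment": 0.78},
    {"title": "stainless steel kettle", "sentiment": 0.95},
]
picks = recommend(catalog[0], catalog, top_k=2)
```

In this toy catalog the kettle has the highest sentiment value but shares 
no words with the query, so it is dropped at the similarity stage before 
sentiment is considered. 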


This system aims to complement existing e- 
commerce platforms rather than replace them, 
thus focusing on integrating seamlessly into 
users' browsing experiences. To achieve this, the 
solution will be implemented as a Chrome 
extension, automatically extracting data from 
visited pages to provide recommendations 
without requiring manual input from users. This 
integration not only enhances user-friendliness 
but also streamlines the recommendation process 
by eliminating the need to navigate to a separate 
platform. 


6. CONCLUSION AND FUTURE 
ENHANCEMENT 


This endeavor confronts the growing 
hurdles presented by the swift growth of the e- 
commerce sector. Chief among these challenges 
are the overwhelming amount of information and 
the time constraints faced by users. To address 
these issues, a recommendation system has been 
implemented. Nevertheless, there remains a 
gap in understanding the impact of written 
reviews on the effectiveness of these 
recommendations and their subsequent 
influence on consumer purchasing choices. To 
bridge this crucial gap, an innovative system has 
been devised. This system places its emphasis on 
delivering real-time recommendations, 
employing a groundbreaking methodology that 
incorporates the comprehensive sentiment score 
extracted from product listings. This refined 
approach seeks to elevate the user experience 
and streamline the decision-making journey for 
consumers navigating the ever-evolving realm of 
e-commerce. 


Through a thorough exploration of 
relevant literature and existing systems, this 
project aims to explore novel pathways to enrich 
the shopping experience by integrating 
recommendation systems and sentiment analysis. 
The implementation phase of this endeavor 
focuses on designing the system architecture, 
formulating the project plan, executing the 
implementation, and validating the effectiveness 
of the system. The author has carefully crafted a 
system architecture, outlining the vital 
components necessary for achieving the system's 
objectives. Following the implementation of the 
code, a comprehensive project plan is crafted to 
meticulously document the various iterations of 
the system and orchestrate a systematic launch. 
This plan includes a blueprint for evaluating the 
system's performance. Subsequently, after the 
development of the system's code, a rigorous 
testing and evaluation phase is initiated, 
involving three participants in the process. 


In the face of time limitations, the 
project encountered difficulties in broadening its 
range of features, especially when it came to 
conducting in-depth review analyses. The 
challenge primarily stemmed from a lack of 
readily available data regarding product specifics 
and reviews. This scarcity of accessible 
information led to a constrained selection of 
labeled datasets, which in turn impeded the 
crucial process of refining the model through 
fine-tuning. Without a sufficient pool of labeled 
data, it became challenging to thoroughly assess 
the effectiveness of the model. 


Despite these obstacles, it's important to 
highlight that overcoming this limitation remains 
a possibility for future improvements. There is 
potential for addressing this issue through further 
exploration and enhancements in the project. 
This could involve delving deeper into the 
available data sources, developing strategies to 
collect more labeled datasets, or employing 
alternative methodologies to improve the model's 
performance and effectiveness in analyzing 
reviews and product details. 


Prospective improvements to the system 
should prioritize the enhancement of the user 
interface, aiming for a more polished and 
intuitive design. Clearer descriptions should be 
incorporated to elevate user-friendliness, as 
recommended by one of the participants. 
Furthermore, introducing a comprehensive 
dashboard for in-depth analysis of product 
reviews will contribute advanced features to the 
system. This dashboard is envisioned to 
empower users with deeper insights into 
sentiment analysis results and product feedback. 


To further elevate the system's 
performance and reliability, it is imperative to 
integrate a meticulous data labelling process. 
This involves methodically assigning sentiment 
labels to the data, a pivotal step that will 
undoubtedly contribute to the refinement of a 
more resilient and precise sentiment analysis 
model. Additionally, incorporating a user-centric 
feature to evaluate the system's performance 
would be a commendable enhancement. This 
user feedback mechanism catalyzes the perpetual 
refinement of recommendation algorithms, 
ensuring their alignment with the ever-evolving 
preferences and requirements of users. 


Furthermore, for greater user 
convenience, the deployment method could be 
adapted into a Chrome extension. This 
strategic modification not only 
streamlines user access but also seamlessly 
integrates the recommendation system into the 
user's browsing experience on e-commerce sites. 
By doing so, the overall shopping experience is 
elevated, and the system's accessibility is 
substantially increased. This innovative approach 
not only fosters a more user-friendly 
environment but also positions the system as a 
cutting-edge solution in the realm of 
personalized recommendations. 